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Abstract 

The  Markov  chain  simulation  method  has  become  a  powerful  computa¬ 
tional  method  in  Bayesian  analysis.  The  success  of  this  method  depends 
on  the  convergence  of  the  Markov  chain  to  its  stationary  distribution.  We 
give  two  carefully  stated  theorems,  whose  conditions  are  easy  to  verify,  that 
establish  this  convergence.  We  give  versions  of  our  conditions  which  are  sim¬ 
pler  to  verify  for  the  Markov  chains  that  arise  most  commonly  in  Bayesian 
analysis. 

Key  words  and  phrases:  Bayesian  Poisson  regression;  calculation  of  posterior  dis¬ 
tributions;  ergodic  theorem;  Markov  chain  simulation  method. 

1  Introduction 

Let  TT  be  a  probability  distribution  on  a  measurable  space  The  Monte 

Carlo  Markov  chain  method  is  a  technique  for  estimating  characteristics  of  tt  such 
as  7r(E)  or  J  /dir  where  E  £  B  and  /  is  a  bounded  measurable  function,  and 
which  is  useful  when  rr  is  too  complex  to  describe  analytically.  The  idea  is  very 

’Research  supported  by  National  Science  Foundation  Grant  DMS-92-04938 
^Research  supported  by  Air  Force  Office  of  Scientific  Research  Grant  90-0202 
f  Research  supported  by  Army  Rc.search  Office  Grant  DAAL0.'!-90-G-0103 


straightforward.  We  construct  a  transition  prohabiiity  funftlun  wstii  the 

property  that  it  ha.s  stationary  distribution  r,  i.e. 


J  P{TXn^{il-r) 


Then,  we  generate  a  Markov  chain  {A,,}  with  this  Iran.silion  incjbability  fuju  tioi) 
as  follows.  We  fix  a  starting  point  Xq,  generate  an  ub.servalion  Ad  from 
generate  an  observation  X>  from  /^(A,--)-  etc.  'riiis  i)!oduces  the  .Markos  chain 
.To  =  AT- A'l,  Ab, .  ■  V\’e  use  this  construction  in  one  of  two  ways.  Lither  we 
discard  an  initial  segment  Ay.  A’j ,  AT. . . . ,  Ad  of  the  .Markov  chain,  in  wiiicjj  t  he 
chain  has  not  yet  converged  to  its  stationary  distribution,  and  rel.iiu  tin;  rest  of 
the  chain,  or  we  independently  run  a  large  number  of  chains  and  for  ('acb  la  tain 
ojily  the  last  observation.  In  either  case  we  we  use  the  observations  that  we  have 


kept  to  obtain  empirical  estimates  of  those  fealur<“s  of  tt  that  arc  of  interest. 

Implicit  in  this  method  is  the  assumption  that  the  chain  converges  to  its  sta¬ 
tionary  distribution,  for  a  wide  class  of  starling  points  Xq.  Inch.-ed.  one  can  easily 
give  examples  of  Markov  chains  that  do  not  converge  to  theii'  stationary  disinbu- 
tion  from  any  starting  point.  Thus,  to  establish  the  validity  of  the  method,  it  is 
crucial  to  obtain  results  that  gi^  e  conditions  which  imply  coin  ergencc  of  the  chaiTi. 

The  Markov  chain  literature  already  contains  many  results  that  give  conditions 
under  which  the  Markov  chain  converges  to  its  stationary  distribution  for  a  clas.s 
of  starting  points  xo  >vhich  have  probability  one  under  -  (this  condition  is  called 
ergodicity).  Unfortunately,  when  one  comes  to  apply  these  res\ilts.  one  immediately 
notices  that  in  5^flff.s/fca/ applications,  the  conditions  of  these  theorems  are  ^•ir1na!ly 
impossible  to  check. 

In  our  work  we  iiavc  obtained  two  theorems  (Theorems  1  and  2  below)  that 
assert  ergodicity  of  the  chain  under  conditions  tlial  are  extremely  eas}'  to  verify  in 
a  wide  range  of  problems  that  are  likely  to  arise  in  Bayesian  statistics.  Tliese  the¬ 
orems  pertain,  roughly,  to  the  two  modes  of  using  the  Markov  chain  construction. 
Before  explaining  our  theorems,  it  is  useful  to  give  an  idea  of  the  wide  scope  of  the 
problems  that  can  be  approached  via  the  Monte  Carlo  Markov  chain  method. 

There  are  many  ways  to  produce  a  transition  function  satisfying  (1.1).  .Mc’th- 
ods  include  the  Metropolis  algorithm  and  its  variants,  and  the  so-called  Gibbs 
sampler.  This  last  method  appears  to  be  the  one  that  is  the  most  widely  used 
in  Bayesian  statistics,  and  we  now  proceed  to  describe  it.  Idiis  algorithm  is 
used  to  estimate  the  unknown  joint  distribution  tt  =  ~a'!"  (possi¬ 

bly  vector- valued)  random  variables  (A''^^>  . . . ,  A'l’’*)  by  updating  the  coordinates 
one  at  a  time,  as  follows.  W'e  suppose  that  we  know  the  conditional  distiibu- 
tions  i  —  1, ...  or  at  least  that  we  are  able  to  generate  observa¬ 


tions  from  these  conditional  distributions.  If  A'r,, 
state,  the  next  state  ATi+i  ■-  (  A^'+i— •  • , A''^,f|]) 
as  follows.  Generate  A'i’ji  from 


=  (A'^’\  ....  A"^f*)  is  the  current 
of  the  .Markov  diaiii  is  fornicd 
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is  g<MH'ralod  froin 


=  +  . ‘‘»d  SO  Oil  Ulllil  .Yil'!., 

^A-oOj{.v(j)  . If  F  is  tiie  Iransilion  funriion  that  produces 

from  Xm,  then  it  is  easy  to  see  that  P  satisfies  (1.1). 

We  now  give  a  very  brief  description  of  how  this  method  is  useful  in  some 
Bayesian  problems.  We  suppose  that  the  [laranieter  0  has  .some  prior  distribu! ion. 
that  we  observe  a  data  point  V’  whose  conditional  distribution  given  0  is  £(  V  j  ()}. 
and  that  w'e  wish  to  obtain  C{()  |  V’),  the  conditional  distribution  of  0  given  V,  It  is 
often  the  case  that  if  wc  consider  an  (unob.servable)  auxiliary  random  variable  Z . 
then  the  distribution  ~o  ^  =  C{0,  Z\\']  has  the  property  that  (=  C{0  {  V,  Z)) 
and  TT^ie  (=  C{Z  \  y\0))  are  easy  to  calculate.  Typical  examples  are  missing  and 
censored  data  problems.  If  we  have  a  conjugate  family  of  prior  distributions  on 
then  we  may  take  Z  to  be  the  missing  or  the  censored  observations,  so  that 
TTfli;?  is  easy  to  calculate.  The  Gibbs  sampler  then  gives  a  random  ohservation  with 
distribution  (approximately)  C{0,  Z  \  Y).  and  retaining  the  first  coordinate  gives 
an  observation  with  distribution  (approximately)  equal  to  C{0  IV). 

•Another  application  arises  when  the  parameter  0  is  high  dimensional,  and  we 
are  in  a  nonconjugate  situation.  Let  us  write  0  —  {Oi, . . .  ,0^),  so  that  what  we  wish 
to  obtain  is  Direct  calculation  of  the  posterior  will  involve  the  evaluation 

of  a  ^-dimensional  integral,  w'hich  ma\'  be  difficult  to  accomplish.  On  the  other 
hand,  application  of  the  Gibbs  sampler  involves  the  generation  of  one-dimensional 
random  variables  from  '!^e,\{Sj  The  generation  of  random  variables  from  a  one- 
dimensional  distribution  is  in  general  much  ea.sier  than  from  a  multidimen.sional 
distribution;  very  often  special  tricks  can  be  used.  We  illustrate  this  with  an  ex¬ 
ample  in  Section  2  below.  In  addition,  we  note  that  the  distribution  is 

available  in  closed  form,  except  for  a  normalizing  constant.  There  exist  very  effi¬ 
cient  algorithms  for  generating  random  variables  from  such  a  distribution,  provided 
the  distribution  is  unimodat;  sec  Zaman  (1992). 

2  Illustration  of  the  Markov  Chain  Simulation 
Method;  Bayesian  Poisson  Regression 

.As  a  typical  application  of  how  the  Gibbs  sampler  helps  in  high  dimensional  prob¬ 
lems,  we  consider  a  model  involving  Bayesian  Poisson  regression.  Thi.s  model  is 

y;  ~  Poisson(A,),  Ai  =  '  =  T2, . . .  ,n, 

where  the  Xij's  are  non-negative  covariates,  and  where  tlie  prior  distribution  on 
the  /3j’s  is  a  product  of  Gammas.  In  this  case,  the  likelihood  function  is 

P(A)  =  IIexp(-A,)^ 

1=1  P'- 
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Ti  V 


{yi'-yi'-  -  yn')  ’cxpf  H  ( 

^  .=1  j=i  ^  .=1  '■  j=i 


and  the  joint  density  of  the  is  given  by 

P 


=<  <=>^p(  Il't’'  ■ 

^  ;=l  ^  3  =  1 


where  aj  is  the  shape  and  bj  is  the  scale  parameter  for  the  distribution  of  dj. 
j  —  1,2, .. .  ,p.  Tlie  posterior  joint  density  of  the  ffj's,  given  tiie  data,  is  tlu  refure 


T{i3uB2,...,l3p)  oc  exp 


where  ty  =  bj  +  rc.y,  j  =  1,2, ...  ,77.  To  determine  the  posterior  joint  density 
of  the  /3j’s  exactly,  the  constant  of  proportionality  needs  to  be  determined.  This 
requires  high-dimensional  integration.  However,  the  Gibbs  sampler  can  be  used  if 
■  ,’e  know  the  conditional  distributions  of  any  Bt  gi^’cn  the  rest  of  the  P'^s  and  the 
data. 

To  compute  the  conditional  density  of  any  pk.  k  =  1,2, ...  ,p,  given  the  rest 
of  the  pj's  and  the  data,  we  proceed  as  follow.  F'or  each  h  1  <  /  <  p.  let  Si  — 
{1,2, . . .  ,p}\  {!}.  Then  for  each  k,  the  density  of  0k,  conditional  on  all  0j,  j  e  Sk, 
and  the  data  is  the  discrete  mixture  of  Gamma  densities 

fM3:,jiS,{0k)  OC  01“''  expi-Vk0k)  n(c,  -h  x.kPkY', 

1=1 


where  c,  =  Let  m.  =  y,  and  write 

n  m 

n(c,  +  XihPkf'  = 

1=1  (=0 

where  we  explicitly  show  the  dependence  of  the  coefficients  on  k.  Then, 

lMS,.,isXBu]  <x  exp(.-«iA) 

and  we  readily  recognize  that 

m 

f 0k\0}<}^Sh{0k)  —  '^^Pl{k)9ak+l,Vk{Bk)-, 

1=0 


where  gp,q{x)  denotes  the  gamma  density  with  shape  parameter  p  and  scale  param¬ 
eter  q  in  X,  and  pi{k)  =  ri{k)r{ak  +  "Lhe  p/{L)’s  are  the  discrete  mixture 

probabilities. 


4 


3  Convergence  Theorems 

Before  SLaling  our  theorenis.  wo  will  need  a  f<’W  dofiiiil ions  ronoornino  Markxv 
chains.  Let  (ionolc  the  dislribulion  of  .V„  wlieii  tin-  chain  i,'-  slarled  at 

.Also,  for  a  .set  C  G  B.  let  7’(C)  ==  inf{/(  :  it  >  0,  A’,;  t  C}  In*  tin'  lust  Uiue 
the  chain  hits  C,  after  time  0.  Finally,  for  any  snb.set  I  of  the  positive  hitegers, 
g.c.d.(r)  will  denote  the  greatest  coninion  divisor  of  the  integers  in  I. 

Theorem  1  Sujjpo.se  that  the  Markov  chain  {A'„}  with  tjan.sifion  Ainctioji  /''(.r.C) 
has  an  invariant  proha hiJity  measure  i.e.  (LI)  holds.  Suppose  that  there  is  a  set 
A  E.  B,  a  probability  measure  p  with  p{A)  —  1,  a  constant  t  >  U.  and  an  inteprr 
??()  >  1  such  that 

-{j-  ;  JUT(A]  <  oo)  >  O}  =  1.  (.'LH 

and  ^  ^ 

/^"“(.T.-)  >  cp[-)  for  each  x  E  .A.  t'L2) 

Suppose  further  that 

g.c.d.\m  >  1  :  there  is  an  tm  >  0  such  that  sup  •)  >  c,np(-)}  ~  1-  ('1-3) 

Then  there  is  a  set  Dq  such  that 

7r(Do)  =  1  and  sup  \P’'{x,C)  —  0  for  each  x  G  Do-  (3.4) 

C'€i' 


Theorem  2  Suppose  that  the  Markov  chain  {A’„}  with  transition  function  l^i-r.  C) 
satisfies  conditions  (LI),  (3.1)  and  (3.2).  Then 

I  1  I 

sup  —  C)  —  7r(C)  — >  0  as  m  —*  oc  for  [TTj-aiinost  all  x,  (3.5) 

^0  r=:0 

and  hence 

1  n 

sup  |— Tl  P^{x,C)  —  ~(C')|  — >  0  a.s  n  oc  for  [;r]-a/nms(  all  x.  (3.G) 

C€y j=i 

Let  f{x)  be  a  measurable  function  on  {X,B)  such  that  f  <  oc.  Then 

-iZfiYj)  I  ■^(3ij)J{y)\  =  1  for  [-j-a/mosi  all  x  (3.7) 

t71.  J  J 

and 

1  "  r 

—  Y'  Ea{f{Xj))  I  ‘x(dy) f{y)  —  1  for  [7r]-almost  all  x.  (3.8) 

n  ' 


o 


Theorciii  1  requires  coiidiliou  (3.d),  while  riicorein  2  does  not.,  d  heoreni  2 
states  that  if  condition  (3.3)  is  violated,  one  can  still  apply  the  Markov  chain 
simulation  method,  except  that  one  has  to  work  with  avcraites  of  dependent  random 
variabies  instead  of  running  a  large  number  of  independent  chains  and  working 
with  an  (approximately)  independent  sample.  'i’he.se  two  tlieorems  are  proved  in 
Athrcya.  Doss,  and  Sethuraman  (1992).  where  it  is  also  shown  that  th(?se  arc  tlie 
weakest  possible  conditions  that  will  ensure  convergence  of  a  Markov  chain  ior  a 
set  of  starting  points  having  probability  one  under  the  stationary  distribution. 

There  are  already  many  theorems  that  give  conditions  that  guarantee  ergod- 
icity  of  Markov  chains.  See  the  discussion  in  Section  1  of  .Athreva.  Doss,  and 
Sethuraman  (1992).  Most  of  these  theorems  are  .stated  under  two  general  class(.-s 
of  conditions.  Conditions  in  the  first  cla.ss  involve  verilicalion  of  a  “recurrence 
condition"  which  is  much  stronger  than  our  condition  (3.1).  Conditions  i.u  the  sec¬ 
ond  class  of  involve  the  stationary  distribution  of  the  chain.  Since  this  stationary 
distribution  is  unknown,  these  conditions  are  difficult  to  verify.  In  contrast,  our 
theorems  are  slated  under  conditions  that  involve  only  the  transition  function,  and 
thus  are,  in  general,  easier  to  verify. 

Theorems  1  and  2  pertain  to  arbitrary  Markov  chains.  As  we  mentioned  earlier, 
the  Gibbs  sampler  is  the  most  commonly  used  Markov  chain  in  Bayesian  statistics. 
We  now'  give  a  result  that  facilitates  the  use  of  our  theorems  when  the  Markov  chain 
used  is  the  Gibbs  sampler.  We  use  the  notation  of  Section  1,  and  assume  that  for 
each  i.  the  conditional  distributions  densities,  say  Px,|{.vo> 

with  respect  to  some  dominating  measure  p,. 

Theorem  3  Suppose  that  for  each  i  =  1,  —  k  there  is  a  set  A,  with  p,(.4,)  >  0. 
and  a  6  >  0  such  that  for  each  i  =  1, . . . ,  A: 

>0  (3.9) 

whenever 

G  A\, . . .  G  A,,  and  x*'"''^*, . . .  are  arbitrary, 

and 

PxMxin  •  •  •  ,2:*'^^)  >  6  whenever  x'^*  G  ^  1 . .  , . .  A:. 

Then  conditions  (3.1)  and  (3.2)  are  .satisfied  with  Uq  =  1.  Thus,  (3-3)  is  also 
satisfied,  and  the  conclusions  of  Theorems  1  and  2  bold. 

We  note  that  condition  (3.9)  is  often  checked  for  all  x^’^, . . . 

This  theorem  is  immediate  for  the  case  k  =  2.  For  the  general  case  the  proof 
follows  by  induction. 
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